Identification of Case, Digits and Special Symbols Using a Context Window

Authors

  • Tin Kam Ho
  • George Nagy
Abstract

We present strategies and results for identifying the symbol type of every character in a text document. Assuming reasonable word and character segmentation for shape clustering, we designed several type recognition methods that depend on cluster n-grams, characteristics of neighbors, and within-word context. On an ASCII test corpus of 925 articles, these methods represent a substantial improvement over default assignment of all characters to lower case.
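As a rough illustration of the baseline named in the abstract, the sketch below labels each character with a symbol type and measures how often the default "all lower case" assignment would be correct. The four-way class set (upper, lower, digit, special), the function names, and the sample sentence are assumptions made for illustration; the abstract does not give the authors' exact classes, corpus handling, or scoring rule.

```python
from collections import Counter

def symbol_type(ch: str) -> str:
    """Return a coarse symbol type for one character (assumed class set)."""
    if ch.isupper():
        return "upper"
    if ch.islower():
        return "lower"
    if ch.isdigit():
        return "digit"
    return "special"

def true_types(text: str) -> list[str]:
    """Label every non-whitespace character with its symbol type."""
    return [symbol_type(ch) for ch in text if not ch.isspace()]

def default_lowercase_accuracy(text: str) -> float:
    """Fraction of characters that the 'everything is lower case' default gets right."""
    labels = true_types(text)
    if not labels:
        return 0.0
    return sum(1 for t in labels if t == "lower") / len(labels)

# Hypothetical sample text, not taken from the test corpus.
sample = "In 2001, Ho and Nagy tested 925 articles."
print(Counter(true_types(sample)))
print(round(default_lowercase_accuracy(sample), 3))
```

Any type recognition method can be scored the same way by replacing the constant "lower" guess with its per-character predictions, which gives a direct comparison against the default.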


Similar resources

SECURING INTERPRETABILITY OF FUZZY MODELS FOR MODELING NONLINEAR MIMO SYSTEMS USING A HYBRID OF EVOLUTIONARY ALGORITHMS

In this study, a Multi-Objective Genetic Algorithm (MOGA) is utilized to extract interpretable and compact fuzzy rule bases for modeling nonlinear Multi-input Multi-output (MIMO) systems. In the process of nonlinear system identification, structure selection, parameter estimation, model performance and model validation are important objectives. Furthermore, securing low-level and high-level ...


Nonlinear system identification using higher order statistics

A general formula is given for the conditional mean in terms of higher order statistics. Using this formula, a general scheme for nonlinear system identification is introduced including a broad range of nonlinearities which depends on the probability density function of the input. As a special case of that general scheme, the polynomial system identification problem is treated. It is shown that o...


Pii: S0165-1684(01)00096-2

A subspace based blind channel identification algorithm using only the fact that the received signal can be oversampled is proposed. No direct use is made in this algorithm of either the statistics of the input sequence or even of the fact that the symbols are from a finite set and therefore this algorithm can be used to identify even channels in which arbitrary symbols are sent. Using this algor...


Open-loop worst-case identification of nonSchur plants

This paper presents an LMI based algorithm for deterministic worst-case identification of nonSchur plants in an open-loop setting. Contrary to other approaches dealing with this problem, the proposed technique does not require prior knowledge of a stabilizing controller. The main result of the paper shows that, as the information is completed, the identified model converges, in the ℓ2-induced top...


Maximum-likelihood blind FIR multi-channel estimation with Gaussian prior for the symbols

We present two approaches to stochastic Maximum Likelihood identification of multiple FIR channels, where the input symbols are assumed Gaussian and the channel deterministic. These methods allow semi-blind identification, as they accommodate a priori knowledge in the form of a (short) training sequence and appears to be more relevant in practice than purely blind techniques. The two approaches a...




Publication date: 2001